Experiments from a Machine Vision Internship at FlyInstinct

experiments

Context

Internship of Louis Geisler - Data Scientist

AI experiments

GAN

Main idea of the AEGAN (Auto-Encoder Generative Adversarial Network)

The point of using an AEGAN is to replace the MSE metric, which produces blurry images because it mainly focuses on the global appearance.

This is why using a discriminator, a network that focuses only on finding the details that differentiate a true image from a generated one, should produce a better autoencoder.
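The intuition above can be written as a combined training objective for the autoencoder. This is a minimal numpy sketch, not the exact loss used in the experiments; the function name and the weight `lam` are illustrative:

```python
import numpy as np

def aegan_generator_loss(x, x_hat, d_fake, lam=0.1):
    """Hypothetical combined AE-GAN objective for the autoencoder:
    a pixel-wise MSE term (global appearance) plus an adversarial term
    that rewards fooling the discriminator. d_fake is the discriminator's
    score on the reconstruction, in (0, 1]; lam is an illustrative weight."""
    mse = np.mean((x - x_hat) ** 2)        # the term that alone gives blurry images
    adv = -np.mean(np.log(d_fake + 1e-8))  # small when the discriminator is fooled
    return mse + lam * adv
```

The adversarial term pushes the autoencoder toward reconstructions the discriminator cannot tell apart from real images, which is what should restore sharpness.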

Experience report

  1. Experiment: I implemented a GAN for our use case.

  2. Experiment: I switched to WGAN-GP to improve stability.

  3. Experiment: I converted the WGAN-GP into an AE-WGAN-GP.

Result Summary

The plain GAN proved very unstable ('mode collapse'), and switching to WGAN-GP greatly improved stability and convergence speed.

However, the conversion to an AE-WGAN-GP did not work yet, and I don't understand why.

Next step

Restart the AE-WGAN-GP implementation from the beginning and see if the results improve.

experience_1

Information

Test an Autoencoder-GAN to try to generate sharper images.

  1. I first implemented the Keras GAN example and made it work.
    Results: The GAN is pretty slow and the output images aren't great.

  2. Then I replaced the generator with an autoencoder.
    Result: It works, but there is no longer a constraint forcing it to generate images similar to its input.

  3. I replaced the discriminator with a pretrained Keras image classifier (Xception).
    Result: It totally failed, so I kept the default discriminator.

(Figures: discriminator and generator architectures.)

Results

It is very hard to make this GAN work well. In many cases I had a 'mode collapse' issue, where one of the agents was far stronger than the other one, and the loss exploded.

Also, the quality of the output images isn't great... You can see the evolution of the images.

It also highlights that we need a special mechanism to force the autoencoder to output an image similar to its input.

Conclusion

To overcome this stability problem, I will try to implement a WGAN-GP in the next experiment.

Next step

Try to implement WGAN-GP.

experience_2

Goal

Test WGAN-GP using the Keras implementation example.

I just changed the output activation function: instead of tanh, I tried sigmoid and also relu1 (relu1 is something I tried; it is similar to relu6: linear on [0, 1] and constant outside).
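As a sketch, relu1 is just a clip; a numpy version is shown here for clarity (in Keras the same behaviour can be obtained with a clipping activation):

```python
import numpy as np

def relu1(x):
    # Linear on [0, 1], constant outside: the same idea as relu6 but capped at 1.
    return np.clip(x, 0.0, 1.0)
```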

Results

WGAN-GP proved to be far better than the plain GAN: it converges faster and didn't seem to suffer from "mode collapse".

Test sigmoid:

Evolution

Test relu1:

Evolution

Conclusion

Overall, WGAN-GP is far better and faster than the plain GAN, so I will keep using it. But it is also mathematically harder to understand...

There also seems to be a small problem with the images when using sigmoid, but I didn't understand why... So I will keep using relu1.

Next Step

Trying to convert the WGAN-GP into an AE-WGAN-GP.

experience_3

Information

I will now try to convert the WGAN-GP to an Autoencoder-WGAN-GP.

To do so, I will need to replace the generator with an autoencoder and the discriminator with a new one.

The discriminator follows this logic: it has two image inputs, and we feed it a real image R and its reconstruction A = autoencoder(R), but in a random order: either (R, A) or (A, R). We can associate a number with each combination, say (R, A) => 1 and (A, R) => 0. The discriminator outputs a number in [0, 1]: 1 if it thinks its inputs were in the order (R, A), 0 for (A, R). The goal of the autoencoder is to fool the discriminator into outputting 0.5, i.e. the discriminator can no longer differentiate between the A and R images.
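The pairing logic can be sketched as follows (a minimal illustration; `real` and `reconstructed` stand for the R and A images):

```python
import random

def make_discriminator_sample(real, reconstructed, rng=random):
    """Return the two inputs in a random order plus the label the
    discriminator should predict: 1.0 for (R, A), 0.0 for (A, R)."""
    if rng.random() < 0.5:
        return (real, reconstructed), 1.0
    return (reconstructed, real), 0.0
```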

But because we are no longer using a plain GAN but a WGAN-GP, it becomes far harder to implement. Also, because Keras models are compiled, finding the errors is truly hard.

Results

I tested a new discriminator with two images as input: it is able to differentiate images flipped left-right, so I conclude that it works very well.

But the GAN didn't work well:

Epoch Evolution through epochs Target
0
2
9

Conclusion

The AE-GAN didn't seem to work, but I couldn't understand why...

Next step

I will redo the whole implementation from the beginning and see if I can get better results.

VQ-VAE

Goal:

Test the potential of VQ-VAE to detect anomalies, following the idea of this scientific paper and using the reference implementation of VQ-VAE available on the Keras website.

Setup

The VQ-VAE is trained with a latent dim of just 1 and 25 embeddings:

data_variance = np.var(dataset["train"]["image"] / 255)
vqvae_trainer = VQVAETrainer(data_variance, latent_dim=1, num_embeddings=25)
vqvae_trainer.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-2))

Results

Example of a training run:

Epoch 1/10
442/442 [==============================] - 7s 16ms/step - loss: 2.6262 - reconstruction_loss: 0.1329 - vqvae_loss: 2.3598
Epoch 2/10
442/442 [==============================] - 7s 16ms/step - loss: 0.0318 - reconstruction_loss: 0.0285 - vqvae_loss: 0.0024
Epoch 3/10
442/442 [==============================] - 7s 16ms/step - loss: 0.0308 - reconstruction_loss: 0.0307 - vqvae_loss: 0.0015
Epoch 4/10
442/442 [==============================] - 7s 16ms/step - loss: 0.0279 - reconstruction_loss: 0.0271 - vqvae_loss: 0.0012
Epoch 5/10
442/442 [==============================] - 7s 16ms/step - loss: 0.0276 - reconstruction_loss: 0.0270 - vqvae_loss: 0.0011
Epoch 6/10
442/442 [==============================] - 7s 16ms/step - loss: 0.0263 - reconstruction_loss: 0.0258 - vqvae_loss: 0.0010
Epoch 7/10
442/442 [==============================] - 7s 16ms/step - loss: 0.0284 - reconstruction_loss: 0.0266 - vqvae_loss: 9.6289e-04
Epoch 8/10
442/442 [==============================] - 7s 16ms/step - loss: 0.0247 - reconstruction_loss: 0.0259 - vqvae_loss: 9.1901e-04
Epoch 9/10
442/442 [==============================] - 7s 16ms/step - loss: 0.0291 - reconstruction_loss: 0.0265 - vqvae_loss: 9.7684e-04
Epoch 10/10
442/442 [==============================] - 7s 16ms/step - loss: 0.0307 - reconstruction_loss: 0.0272 - vqvae_loss: 0.0011

As we can see, the loss (= total loss) doesn't always decrease, and it cannot be overfitting, since this run used no validation data (all the losses shown are training losses).

Example images

Image type Input Prediction Difference
Runway
Random

The result of the random image prediction truly surprised me.

Problem

I didn't fully grasp how VQ-VAE truly works. There seems to be a second, very important training step with PixelCNN, but again, I failed to fully understand what it was all about.

Conclusions

VQ-VAE didn't generalise the runway concept: as we can see, it didn't remove the FODs in its prediction.

It seems more like it can reproduce any kind of image.

autoencoders

Idea

I want to use an autoencoder as an unsupervised means to make the model focus on what a runway is, and only that.

Hopefully, when that model sees a FOD on the runway, because it has never seen one, it will try to make sense of it by replacing it with runway.

And at the end, we just subtract the input image from the output image to locate the differences precisely.

In truth, there is a second model, diff2mask, that is called on the difference image of the autoencoder, to get a true mask as output and not just a difference whose quality is hard to quantify...

Diff2Mask can be a UNET model or just a dozen or so convolution layers.

It is the 'more dangerous' part of the learning, as we switch from unsupervised to supervised learning, so the quality of this small model depends greatly on the quality of the generated FODs.
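The overall pipeline (reconstruction difference, then a mask) can be sketched like this; the threshold function below is only a stand-in illustrating what the learned diff2mask model does:

```python
import numpy as np

def diff_image(x, x_hat):
    # Absolute per-pixel difference between input and reconstruction.
    return np.abs(x.astype(float) - x_hat.astype(float))

def naive_diff2mask(diff, threshold=0.1):
    """Stand-in for the learned diff2mask model: a plain threshold on the
    mean channel difference. The real model is a UNET or a small CNN that
    separates reconstruction noise from actual FODs."""
    return (diff.mean(axis=-1) > threshold).astype(np.uint8)
```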

Experience report

  1. Experiment: Test how sharp an image an autoencoder can output.

  2. Experiment: Test which is the best regularisation layer: Dropout, SpatialDropout, BatchNormalisation, or none.

  3. Experiment: Test a model with many channels to see if it can output a sharp image and still remove the FODs.

  4. Experiment: Test a new autoencoder architecture with a resizing layer as encoder and a super-resolution model as decoder.

  5. Experiment: Test the autoencoder model with near-camera images.

Result Summary

The autoencoders work pretty well, and because they are unsupervised, they should be able to detect any kind of FOD.

But one of their drawbacks is that they output blurry images.

Next step

One solution to improve the autoencoder is to change the metric: instead of using MSE, use an Autoencoder GAN (AEGAN) (see the GAN experiments).

experience_1

Goal:

Test how sharp the output of an autoencoder can be.

Experience

Train an autoencoder for 1000 epochs with 150 channels and 4 layers of depth.

For this experiment, I tried to use the Keras Hyperparameters Tuner to fine-tune the model.

Results

Runway image

Input:

Output:

Random image

Input:

Output:

Conclusion

We can conclude that the autoencoder can output good enough images if it has enough channels and epochs. But this autoencoder didn't generalise the runway; instead it is able to copy any image.

To solve that problem, we may introduce dropout, reduce the number of channels, or reduce the number of hidden layers.

Next Step

So the next experiments will try to see how to improve the generalisation of the runway concept.

experience_2

Goal:

Test which is the best regularisation layer: Dropout, SpatialDropout, BatchNormalisation, or none.

Setup

Train four different autoencoders for 50 epochs, with a batch size of 10, 50 channels and 3 layers of depth.

(These models only have 50 channels because, with 150 like the previous model, training is very likely to crash, as the model is very big in CPU memory.)

Results

  • BatchNormalisation: val_loss 0.0040; converges quickly but doesn't improve much. It has a kind of 'over-exposure' effect and didn't truly generalise the runway: as we can see on the random image, the black orthophoto areas aren't black.

  • None: converges very quickly and, even without a regularisation layer, doesn't overfit. It also keeps the black areas, so it has a good generalisation of the runway. And it gets the best score: val_loss 0.0031.

  • Dropout: val_loss 0.0033; gets a good validation score, was able to 'remove' a FOD from a runway, and had already generalised the runway, as the black areas are kept on the random image prediction.

  • SpatialDropout: pretty slow to converge and only gets a val_loss of 0.0042.

Conclusions

I will keep no regularisation, and if there is a suspicion of overfitting, I will use a dropout layer. There also seems to be a trade-off to find between sharpness and generalisation ability.

Next Step

Because no regularisation layer seems to be the best option, the next step will be to train a model with no regularisation layer, for a long time and with a number of channels between 50 and 150.

batchnormalisation

Information

Model trained for 50 epochs with 50 channels and 4 layers, with a batch normalisation layer.

Results

Training

The batch-normalised model converges very quickly (in about 5 epochs), as its training history shows:

Html version: training history

Best validation score: 0.0040

Quality

Runway image

Input:

Prediction:

Random image

Input:

Prediction:

Problem

We can see a kind of 'overexposure' on the predicted runway, on the white line near the bottom.

Conclusion

Because of that 'overexposure' problem, we should not use batch normalisation.

dropout

Information

Model trained for 50 epochs with 50 channels and 4 layers, with a dropout layer of 0.2.

Results

Training

The model converges in about 23 epochs, as its training history shows:

Html version: training history

Best validation score: 0.0033

Quality

Runway image

Input:

Output:

Random image

Input:

Output:

none

Information

Model trained for 50 epochs with 50 channels and 4 layers, with no regularisation layer.

Results

Training

The model converges very quickly (in about 5 epochs), as its training history shows:

Html version: training history

Best validation score: 0.0031

Quality

Runway image

Input:

Output:

Random image

Input:

Output:

spatial_dropout

Information

Model trained for 50 epochs with 50 channels and 4 layers, with a spatial dropout layer of 0.2.

Results

Training

Training history of the model:

Html version: training history

Best validation score: 0.0042

Quality

Runway image

Input:

Output:

Random image

Input:

Output:

experience_3

Goal

Model trained for 1000 epochs with 90 channels and 4 layers, with no regularisation layer.

(Early stopping with a patience of 50: training stops when the validation loss doesn't improve for 50 epochs.)

This time, I also add a new small model to convert the difference between the input and the output of the autoencoder into a mask; let's call that model diff2mask. One possibility for diff2mask would be to use the UNET architecture, but because UNET is a very big model, using it would be more like doing classification than unsupervised learning. So I prefer to use a very small model of ~5 convolutional layers that just differentiates noise from real FODs, and that's all.

Results

Training

History training:

Html version: training history

Best validation score: 0.0020

Quality

Image type Input Output
Runway
Random

Example on real FODs

Step Image 1 Image 2
Input
Output
Difference
predicted mask
bin predicted mask
target mask

Conclusion

We can see that the image is still very blurry. This issue is caused by:

  • The loss function (MSE), which doesn't account for the sharpness of the image; one solution may be to use a GAN.
  • The structure isn't good enough; it may be better to add more channels (as experiment 1 showed, 150 channels output a clear image, but the model is then no longer able to remove FODs), or to switch to a VQ-VAE, which is known to generate sharper images.

We can also see that because the output isn't sharp enough, the model isn't good at detecting real FODs: it falsely detects the ends of white lines and spotlights as FODs.

Next Step

  • Test a GAN or a VQ-VAE to improve the sharpness of the images.
  • Try another kind of autoencoder, with a resizing layer as encoder and a super-resolution model as decoder. (Experiment 4)

experience_4

Goal

Test whether using a resizing layer as the encoder part and a super-resolution model as the decoder part (see the Keras example implementation) may be better than a typical autoencoder.

We resize the image 608x560 => 152x140 (four times smaller).
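The shapes can be checked with a minimal numpy stand-in: average pooling plays the role of the resizing encoder, and a nearest-neighbour upsample stands in for the super-resolution decoder (both are illustrations, not the actual layers used):

```python
import numpy as np

def resize_encoder(img, factor=4):
    """Stand-in for the resizing 'encoder': average-pool the image by
    `factor`, turning 608x560 into 152x140 for factor 4."""
    h, w = img.shape[:2]
    return img.reshape(h // factor, factor, w // factor, factor, -1).mean(axis=(1, 3))

def naive_decoder(small, factor=4):
    # Stand-in for the super-resolution decoder: nearest-neighbour upsample.
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
```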

Results

Training

History training:

Html version: training history

Best validation score: 0.0020

Quality

Image type Input Prediction Difference
Runway
Random

Conclusion

We can see it didn't work that well: the black and white dots didn't really disappear...

experience_5

Information

As asked by Tao, I tested an autoencoder model with near-camera images.

We keep the same architecture as experiment 3 (so a normal autoencoder, not one with a resizing layer as encoder and a super-resolution model as decoder).

We will see if the autoencoder can overcome the car shadow problem.

(Early stopping with a patience of 50: training stops when the validation loss doesn't improve for 50 epochs.)

Results

Training

History training:

Html version: training history

Best validation score: 0.0053

Quality

Image type Input Output
Runway
Random

Example on real FODs

Step Image 1 Image 2 Image 3
Input
Output
Difference
predicted mask
bin predicted mask
target mask

More examples here

Conclusion

We can see that the autoencoder is able to keep the car shadow. We can even see that it generates part of this shadow on the random image.

The autoencoder seems to have good potential to detect FODs both with and without the car shadow.

We can also see that the autoencoder + diff2mask model is able to detect real FODs, so I think this model may be used in production.

But I think we should train the model on a dataset with more different shapes of shadow, because as we can see on the random image, the autoencoder seems to directly output the complex part of the shadow.

Next Step

Next step: make a clean version of that experiment.

maskRCNN_to_UNET

Goal

Because Mask-RCNN is no longer maintained, and UNET gets better performance while being simpler, we will replace Mask-RCNN with UNET.

Experience report

  1. Experiment: A first implementation of UNET, to compare it to Mask-RCNN.

  2. Experiment: Train UNET on the full dataset with a custom data generator, and split it into two models (spotlights and white lines).

  3. Experiment: Remove bad images and annotations from the test set and measure the impact on the scores.

  4. Experiment: Train a new model on the cleaned dataset.

Result Summary

UNET proved to be a very good successor to Mask-RCNN. A weighted loss function, a per-batch data loader and a cleaned dataset all helped to improve its scores, although the remaining annotation quality still limits them.

Next step

Train and save the final models on the cleaned dataset.

experience_1

Use UNET to replace Mask-RCNN.

Mask-RCNN is too old and there are hardly any up-to-date implementations to be found online. So I try to replace it with UNET.

Issue

Because the dataset was too big to be loaded in memory, I only trained the model on a sample of the original dataset.

Results

Here is the confusion matrix of the UNET segmentation:

At first look, we may think that UNET isn't that great at detecting spotlights. But I think that would be a misinterpretation because, in my opinion, the problem is the loss function (sparse binary cross-entropy), which isn't weighted to focus more on the spotlights than on the background or the white lines.

One should also be careful because this model tends to overfit very quickly, so you have to stop it manually or use a checkpoint to save the model at each step.

This may be caused by the fact that the training and test sets aren't sampled from the same dataset: indeed, the images in each set seem different in proportion and in aspect.

But we have to get the same dataset on which Mask-RCNN was trained to make a first comparison.

Example of segmentation on the train set:

Input Prediction Binarised prediction Target

Example of segmentation on the test set:

Input Prediction Binarised prediction Target

Conclusion

UNET seems to be a very good successor to Mask-RCNN, but we may need a weighted loss function, or to train the model less, or to train on the full dataset. Training the model to focus on only two classes may also improve performance.

Next Step

As asked by Tao, we will split the model in two: one for spotlights, one for white lines. I will also change the method used to load the dataset, with a data loader reading the images from the hard drive on the fly.

experience_2

Goal

Training the UNET model on the full dataset requires changing the dataset loader. Instead of loading the full dataset into a numpy array, I created a custom Keras data generator that loads on the fly the images needed for a specific training batch.
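The generator idea can be sketched framework-free; a real Keras version would subclass `keras.utils.Sequence` with the same `__len__`/`__getitem__` pair (`read_image` here is a placeholder for the actual disk read):

```python
import math

class OnTheFlyLoader:
    """Minimal sketch of a batch loader: only the file paths live in
    memory, and the images of a batch are read from disk on demand."""
    def __init__(self, paths, batch_size, read_image):
        self.paths = paths
        self.batch_size = batch_size
        self.read_image = read_image  # callable: path -> image array

    def __len__(self):
        # Number of batches per epoch.
        return math.ceil(len(self.paths) / self.batch_size)

    def __getitem__(self, idx):
        # Load only the images belonging to batch `idx`.
        batch = self.paths[idx * self.batch_size:(idx + 1) * self.batch_size]
        return [self.read_image(p) for p in batch]
```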

We also decided to create two models, one to segment spotlights and the other one to segment whitelines.

Results

We get a binary cross entropy loss of 0.0014 on the training set and 0.0021 on the test set.

So, we get a pretty good confusion matrix on the test set:

Spotlights:

White lines:

Note:

It is important to note that these scores are per pixel and not per object.
0.7639 doesn't mean that 76% of the spotlights are correctly categorised; it means that 76% of the pixels of the spotlight class are correctly categorised. Given that the annotations use ellipses that are usually bigger than the spotlight, without any real logic, it is normal that we don't get a perfect 'score'.

For example:

Input Prediction Binarised prediction Target

We can see that the annotated spotlights on the target image have different shapes even though the real spotlights are exactly the same, and that the predicted shape is close to the target; it is this small imprecision that decreases the classification score.

Next step

We will see if we can improve these scores by removing bad images and bad annotations from the dataset.

Edit

There was a problem with the calculation of the confusion matrix.

Because of some memory problems, I wasn't able to use the sklearn function to compute the confusion matrix on the whole dataset at once, so I computed the CM on small batches and then took the average of these CMs. But because a mean of means is not the same as a single mean over all elements, the confusion matrix results were wrong.
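The fix is to accumulate raw counts per batch and normalise only once at the end: summing counts is exact, whereas averaging per-batch normalised matrices is not. A numpy sketch:

```python
import numpy as np

def batched_confusion_matrix(y_true, y_pred, n_classes, batch_size):
    """Accumulate the confusion matrix batch by batch, SUMMING raw
    counts; any normalisation should be applied once at the end."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for i in range(0, len(y_true), batch_size):
        t = y_true[i:i + batch_size]
        p = y_pred[i:i + batch_size]
        np.add.at(cm, (t, p), 1)  # count each (true, pred) pair
    return cm
```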

So, the new confusion matrix for spotlight is this one:

We can see that the spotlight classification result drops by ~0.07.

Edit 2

I created a new metric for the confusion matrix: instead of by-pixel results, it gives by-object results. The new metric works like this: for both the target and the prediction mask, we label the connected components (groups of connected pixels) and compute their barycenters. Then we match the nearest groups of the target and prediction masks, with a threshold of 10 pixels (if two centroids are more than 10 px apart, we reject the match).

So it is normal that there is 0.0 in Nothing/Nothing, and there always will be. And spotlight/nothing and nothing/spotlight aren't equal, because the numbers of groups in the target and prediction masks may differ.

It is a bit harder to interpret.
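The matching step can be sketched as follows (a greedy illustration; the connected components would first be labelled, e.g. with `scipy.ndimage.label`, and here the function takes the resulting centroids directly):

```python
import math

def match_centroids(target_centroids, pred_centroids, max_dist=10.0):
    """Greedy nearest-centroid matching between target and predicted
    objects; pairs whose centroids are more than max_dist pixels apart
    are rejected, as in the by-object metric described above."""
    remaining = list(pred_centroids)
    matches = 0
    for ty, tx in target_centroids:
        if not remaining:
            break
        dists = [math.hypot(ty - py, tx - px) for py, px in remaining]
        j = min(range(len(dists)), key=dists.__getitem__)
        if dists[j] <= max_dist:
            matches += 1
            remaining.pop(j)  # each predicted object is matched at most once
    return matches
```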

experience_3

Goal

As asked by Tao, I simply removed the bad images and annotations from the test set; I will check what happens to the scores and try to explain it.

Results

Spotlights

Previous confusion matrix:

New one, with bad annotations removed from the test set:

We can see that, surprisingly, it decreases the FOD detection results.

The explanation is that the bad images contained a lot of 'easy spotlight detections'.

That is why you may think the capacity of the model has decreased, but it has not; its evaluation is simply more reliable than the previous one and fits reality better.

Example of bad image:

Image Annotation

As you can see, the overexposure mainly impacts the white lines, and one may even say it helps to highlight the spotlights.

White lines

Previous confusion matrix:

New one, with bad annotations removed from the test set:

We can see that white line detection indeed improves a lot. So I think this new test set really is better for testing the performance of the model in real conditions.

experience_4

Goal

Now we will train a new model on the cleaned dataset, see if it improves the results, and draw conclusions.

Results

Spotlights

Previous confusion matrix:

New one, with bad annotations removed from the test set:

We can see that, surprisingly, it decreases the FOD detection results.

The explanation is that the bad images contained a lot of 'easy spotlight detections'.

That is why you may think the capacity of the model has decreased, but it has not; its evaluation is simply more reliable than the previous one and fits reality better.

Example of bad image:

Image Annotation

As you can see, the overexposure mainly impacts the white lines, and one may even say it helps to highlight the spotlights.

final

Goal

Now we will train a new model on the cleaned dataset, see if it improves the results, and draw conclusions.

Results

Spotlights

Previous confusion matrix:

New one, with bad annotations removed from the test set:

We can see that, surprisingly, it decreases the FOD detection results.

The explanation is that the bad images contained a lot of 'easy spotlight detections'.

That is why you may think the capacity of the model has decreased, but it has not; its evaluation is simply more reliable than the previous one and fits reality better.

As you can see, the overexposure mainly impacts the white lines, and one may even say it helps to highlight the spotlights.

save_big_model

Final result

About the Dataset:

We use the old Mask-RCNN dataset, which is in COCO format. But because the COCO format takes a long time to load and a lot of memory, I chose to preprocess it into a new dataset. The new one keeps the 512x512x3 image size and links a mask to each image.

An image and its mask are linked by their number (the first part of the file name) and differentiated by the ending: 'image' for the images, 'mask' for the masks.

There was also a problem with the spotlight annotations, which were too big, so I reduced their radius by half during the preprocessing.

It is preprocessed by the script preprocess_coco_runway_dataset.py.

The dataset is already split into train and test sets. This is inherited from the old dataset. I kept it like this to make sure Mask-RCNN and UNET were both trained on the same data before making comparisons.

The dataset has 3 classes:

Class Number
Background 0
Whitelines 1
Spotlight 2

But for the mask images, I multiply that number by 255 // (3 - 1) = 127, so you can clearly see the mask just by opening the image with any image viewer.

There are 263 (image, mask) pairs in the test set and 3466 pairs in the train set.

Note: the dataset has poor-quality annotations: sometimes white lines are missing, sometimes spotlights, and sometimes there are side effects on the borders (you can see small white line annotations in the black areas of the orthophotos). I don't know the root of that problem, but I don't think it lies in the preprocessing script, because I copied exactly the same functions as the original Mask-RCNN notebook.

The dataset images also come from very poor orthophotos, so the images are not in a 2D plane.

Bad annotations impact segmentation performance very badly, so I made a list of the bad annotations, and I also created a bad_images folder to which I move the abnormal images (overexposed, car in frame, people in frame, etc.).

def srange(a, b):
    return list(map(str,range(a, b + 1)))

set_bad_annot_train_spotlight = set(
    srange(191, 194) +
    [
        '196',
        '204',
        '205',
        '572',
        '872',
        '929',
        '1200',
        '1409',
        '1427',
        '1701',
        '1938',
        '1993',
        '2048',
    ] + 
    srange(2073, 2082) +
    [
        '2216',
        '2217',
        '2218',
        '2219',
        '2220',
        '2233',
        '2250',
        '2251',
        '2252',
        '2261',
        '2491',
        '2513',
        '2580',
        '2677',
        '2740',
        '2746',
        '2751',
        '2756',
        '2961',
        '2959',
        '3028',
        '3036',
        '3098',
        '3227',
        '3342',
        '3501',
        '3502',
        '3506',
        '3507',
        '3512',
        '3516',
        '3526',
        '3529',
    ]
)

set_bad_annot_train_whitlelines = set(
    srange(949, 967) +
    srange(1033, 1064) +
    [
        '1695',
        '1847',
        '1892',
        '1922',
        '1923',
        '1967',
        '2086',
        '2093',
        '2079',
    ] +
    srange(2212, 2239) + 
    srange(2246, 2254) +
    srange(2259, 2268) +
    [
        '2274',
    ] +
    srange(2296, 2521) +
    srange(2512, 2516) + 
    srange(2521, 2618) + 
    srange(2536, 2672) +
    [
        '2930',
    ] +
    srange(2946, 2952) +
    srange(2995, 3006) +
    srange(3025, 3043) +
    [
        '3397',
        '3501',
        '3502',
        '3505',
        '3507',
        '3506',
        '3512',
        '3513',
        '3516',
        '3520',
        '3521',
        '3523',
        '3524',
        '3525',
        '3526',
        '3529',
        '3528',
        '3533',
    ]
)

set_bad_annot_test_whitlelines = set(
    [
        '286',
        '287',
        '288',
    ]
)

As you can see, there really are a lot of bad annotations...

Improvement: I recommend making a new, smaller dataset (~100 images) with higher-quality annotations, using labelme and taking samples of very different images.

Training

Setup

I use SparseCategoricalCrossEntropy to train the model, and I add a weight to each class.

Class Number Weight
Background 0 1
Whitelines 1 2
Spotlight 2 4

In this way, I intend to force the model to focus first on spotlights, then on white lines, and then on the background.
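The table above can be turned into a per-pixel weight map and passed to Keras as `sample_weight` alongside the loss; `pixel_sample_weights` is a hypothetical helper sketching the idea:

```python
import numpy as np

# Class weights from the table above: background 1, white lines 2, spotlights 4.
CLASS_WEIGHTS = np.array([1.0, 2.0, 4.0])

def pixel_sample_weights(mask):
    """Map each pixel's class label (0, 1 or 2) to its weight, so the
    sparse categorical cross-entropy counts spotlight pixels 4x."""
    return CLASS_WEIGHTS[mask]
```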

Results

We get a binary cross entropy loss of 0.0014 on the training set and 0.0021 on the test set.

So, we get a pretty good confusion matrix on the test set:

Note:

It is important to note that these scores are per pixel and not per object.
0.7639 doesn't mean that 76% of the spotlights are correctly categorised; it means that 76% of the pixels of the spotlight class are correctly categorised. Given that the annotations use ellipses that are usually bigger than the spotlight, without any real logic, it is normal that we don't get a perfect 'score'.

For example:

Input

We can see that the annotated spotlights on the target image have different shapes even though the real spotlights are exactly the same, and that the predicted shape is close to the target; it is this small imprecision that decreases the classification score.

stitching_diff2mask

Main idea of the Stitching Diff2Mask

Arthur's idea: reuse the diff2mask model previously used in the autoencoder experiments, feed it with differences of stitched images, and train it to detect FODs.

It is a little dangerous because, being supervised learning, its efficiency will be bound to the variety and quality of the generated FODs.

Experience report

  1. Experiment: Test Diff2Mask with a custom loss function that ignores the black areas.

  2. Experiment: Test which is the best regularisation layer: Dropout, SpatialDropout, BatchNormalisation, or none.

  3. Experiment: Test masking the non-overlapping parts of the images.

  4. Experiment: Compare the efficiency of that model to a model with only one input.

  5. Experiment: Try the same one-input model on the RR images.

  6. Experiment: Test this one-input model on real FODs, reusing the FOD dataset.

  7. Experiment: Change the FOD generator to make the generated FODs sharper.

  8. Experiment: Switch back to the two-input model, remove the siamese preprocessing models, and directly take the absolute difference of the two inputs.

Result Summary

The Diff2Mask model works very well in the last experiment, but because it is supervised learning, it is very hard to predict how it will behave on new FODs of a different shape, colour or texture.

Still, I think it may be worth including it in the pipeline and retraining it on each FOD it misses.

Yet I feel unsupervised models should be more powerful at detecting never-seen FODs; that is why I focus on GANs.

experience_1

Information

Create a new model that exploits the stitched images as inputs and outputs a mask detecting the FODs (a Diff2mask model).

I decided to feed the two images into the model by stacking them on their last axis, instead of feeding their difference, because two images keep more information than a difference.

Because the black areas caused by the orthophoto are useless information, we try not to train the model on them, by creating a custom loss function.
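The intended loss can be sketched in numpy (the Keras or PyTorch version would operate on tensors, but the logic is the same; `valid_mask` is 1 on overlapping pixels and 0 on the black areas):

```python
import numpy as np

def masked_mse(y_true, y_pred, valid_mask):
    """Sketch of the custom loss: mean squared error computed only over
    the overlapping (non-black) pixels flagged by valid_mask."""
    sq = (y_true - y_pred) ** 2
    # Average over the valid pixels only; guard against an empty mask.
    return (sq * valid_mask).sum() / np.maximum(valid_mask.sum(), 1)
```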

Problem

It requires a custom loss function if we want to implement a mask system that focuses only on the overlapping region and not on the parts outside.

But because Keras models are compiled and the errors they throw are hardly understandable, I decided to switch to PyTorch for the next experiment. PyTorch has the advantage of being far more flexible and, because it executes eagerly, it is also easier to debug.


---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs, op_def)
   1879   try:
-> 1880     c_op = pywrap_tf_session.TF_FinishOperation(op_desc)
   1881   except errors.InvalidArgumentError as e:

InvalidArgumentError: Invalid value in tensor used for shape: -188

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-24-4297217793d0> in <module>
     14     epochs=epochs,
     15     batch_size=-1,
---> 16     callbacks=[reduce_lr_loss, earlyStopping],
     17 )
     18 

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1146           use_multiprocessing=use_multiprocessing,
   1147           model=self,
-> 1148           steps_per_execution=self._steps_per_execution)
   1149 
   1150       # Container that configures and calls `tf.keras.Callback`s.

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/keras/engine/data_adapter.py in get_data_handler(*args, **kwargs)
   1381   if getattr(kwargs["model"], "_cluster_coordinator", None):
   1382     return _ClusterCoordinatorDataHandler(*args, **kwargs)
-> 1383   return DataHandler(*args, **kwargs)
   1384 
   1385 

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/keras/engine/data_adapter.py in __init__(self, x, y, sample_weight, batch_size, steps_per_epoch, initial_epoch, epochs, shuffle, class_weight, max_queue_size, workers, use_multiprocessing, model, steps_per_execution, distribute)
   1148         use_multiprocessing=use_multiprocessing,
   1149         distribution_strategy=tf.distribute.get_strategy(),
-> 1150         model=model)
   1151 
   1152     strategy = tf.distribute.get_strategy()

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/keras/engine/data_adapter.py in __init__(self, x, y, sample_weights, sample_weight_modes, batch_size, epochs, steps, shuffle, **kwargs)
    318       return flat_dataset
    319 
--> 320     indices_dataset = indices_dataset.flat_map(slice_batch_indices)
    321 
    322     dataset = self.slice_inputs(indices_dataset, inputs)

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py in flat_map(self, map_func)
   1901       Dataset: A `Dataset`.
   1902     """
-> 1903     return FlatMapDataset(self, map_func)
   1904 
   1905   def interleave(self,

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py in __init__(self, input_dataset, map_func)
   5061     self._input_dataset = input_dataset
   5062     self._map_func = StructuredFunctionWrapper(
-> 5063         map_func, self._transformation_name(), dataset=input_dataset)
   5064     if not isinstance(self._map_func.output_structure, DatasetSpec):
   5065       raise TypeError(

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py in __init__(self, func, transformation_name, dataset, input_classes, input_shapes, input_types, input_structure, add_to_graph, use_legacy_function, defun_kwargs)
   4216         fn_factory = trace_tf_function(defun_kwargs)
   4217 
-> 4218     self._function = fn_factory()
   4219     # There is no graph to add in eager mode.
   4220     add_to_graph &= not context.executing_eagerly()

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py in get_concrete_function(self, *args, **kwargs)
   3149     """
   3150     graph_function = self._get_concrete_function_garbage_collected(
-> 3151         *args, **kwargs)
   3152     graph_function._garbage_collector.release()  # pylint: disable=protected-access
   3153     return graph_function

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py in _get_concrete_function_garbage_collected(self, *args, **kwargs)
   3114       args, kwargs = None, None
   3115     with self._lock:
-> 3116       graph_function, _ = self._maybe_define_function(args, kwargs)
   3117       seen_names = set()
   3118       captured = object_identity.ObjectIdentitySet(

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
   3461 
   3462           self._function_cache.missed.add(call_context_key)
-> 3463           graph_function = self._create_graph_function(args, kwargs)
   3464           self._function_cache.primary[cache_key] = graph_function
   3465 

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   3306             arg_names=arg_names,
   3307             override_flat_arg_shapes=override_flat_arg_shapes,
-> 3308             capture_by_value=self._capture_by_value),
   3309         self._function_attributes,
   3310         function_spec=self.function_spec,

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes, acd_record_initial_resource_uses)
   1005         _, original_func = tf_decorator.unwrap(python_func)
   1006 
-> 1007       func_outputs = python_func(*func_args, **func_kwargs)
   1008 
   1009       # invariant: `func_outputs` contains only Tensors, CompositeTensors,

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py in wrapped_fn(*args)
   4193           attributes=defun_kwargs)
   4194       def wrapped_fn(*args):  # pylint: disable=missing-docstring
-> 4195         ret = wrapper_helper(*args)
   4196         ret = structure.to_tensor_list(self._output_structure, ret)
   4197         return [ops.convert_to_tensor(t) for t in ret]

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py in wrapper_helper(*args)
   4123       if not _should_unpack(nested_args):
   4124         nested_args = (nested_args,)
-> 4125       ret = autograph.tf_convert(self._func, ag_ctx)(*nested_args)
   4126       if _should_pack(ret):
   4127         ret = tuple(ret)

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py in wrapper(*args, **kwargs)
    690       try:
    691         with conversion_ctx:
--> 692           return converted_call(f, args, kwargs, options=options)
    693       except Exception as e:  # pylint:disable=broad-except
    694         if hasattr(e, 'ag_error_metadata'):

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py in converted_call(f, args, kwargs, caller_fn_scope, options)
    380 
    381   if not options.user_requested and conversion.is_allowlisted(f):
--> 382     return _call_unconverted(f, args, kwargs, options)
    383 
    384   # internal_convert_user_code is for example turned off when issuing a dynamic

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py in _call_unconverted(f, args, kwargs, options, update_cache)
    461 
    462   if kwargs is not None:
--> 463     return f(*args, **kwargs)
    464   return f(*args)
    465 

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/keras/engine/data_adapter.py in slice_batch_indices(indices)
    305       first_k_indices = tf.slice(indices, [0], [num_in_full_batch])
    306       first_k_indices = tf.reshape(
--> 307           first_k_indices, [num_full_batches, batch_size])
    308 
    309       flat_dataset = tf.data.Dataset.from_tensor_slices(first_k_indices)

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
    204     """Call target, and fall back on dispatchers if there is a TypeError."""
    205     try:
--> 206       return target(*args, **kwargs)
    207     except (TypeError, ValueError):
    208       # Note: convert_to_eager_tensor currently raises a ValueError, not a

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py in reshape(tensor, shape, name)
    194     A `Tensor`. Has the same type as `tensor`.
    195   """
--> 196   result = gen_array_ops.reshape(tensor, shape, name)
    197   tensor_util.maybe_set_static_shape(result, shape)
    198   return result

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py in reshape(tensor, shape, name)
   8402   # Add nodes to the TensorFlow graph.
   8403   _, _, _op, _outputs = _op_def_library._apply_op_helper(
-> 8404         "Reshape", tensor=tensor, shape=shape, name=name)
   8405   _result = _outputs[:]
   8406   if _execute.must_record_gradient():

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(op_type_name, name, **keywords)
    748       op = g._create_op_internal(op_type_name, inputs, dtypes=None,
    749                                  name=scope, input_types=input_types,
--> 750                                  attrs=attr_protos, op_def=op_def)
    751 
    752     # `outputs` is returned as a separate return value so that the output

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py in _create_op_internal(self, op_type, inputs, dtypes, input_types, name, attrs, op_def, compute_device)
    599     return super(FuncGraph, self)._create_op_internal(  # pylint: disable=protected-access
    600         op_type, captured_inputs, dtypes, input_types, name, attrs, op_def,
--> 601         compute_device)
    602 
    603   def capture(self, tensor, name=None, shape=None):

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in _create_op_internal(self, op_type, inputs, dtypes, input_types, name, attrs, op_def, compute_device)
   3567           input_types=input_types,
   3568           original_op=self._default_original_op,
-> 3569           op_def=op_def)
   3570       self._create_op_helper(ret, compute_device=compute_device)
   3571     return ret

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in __init__(self, node_def, g, inputs, output_types, control_inputs, input_types, original_op, op_def)
   2040         op_def = self._graph._get_op_def(node_def.op)
   2041       self._c_op = _create_c_op(self._graph, node_def, inputs,
-> 2042                                 control_input_ops, op_def)
   2043       name = compat.as_str(node_def.name)
   2044 

~/Desktop/stage_louis/experiments/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs, op_def)
   1881   except errors.InvalidArgumentError as e:
   1882     # Convert to ValueError for backwards compatibility.
-> 1883     raise ValueError(str(e))
   1884 
   1885   return c_op

ValueError: Invalid value in tensor used for shape: -188

Because Keras was really hard to debug, I tried to switch to PyTorch.

Results

I test a new discriminator with two images as input: since it is able to tell an image pair apart from its left-right-flipped version, I conclude that it works very well.

Next Step

It may be interesting to test shared networks that preprocess the images before taking the difference; they might compensate for differences in lighting conditions.

experience_2

Information

This time, I add Siamese models (models with shared weights) to preprocess the input images, for example to compensate for the lighting conditions, the weather, etc.

I drop the idea of making a custom loss function, as it would be too cumbersome.
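The shared-weight idea can be sketched as follows; here a hypothetical `preprocess` normalisation stands in for the learned Siamese branch (which in the real model is a small network), showing how applying the SAME function to both inputs cancels a global lighting offset:

```python
import numpy as np

def preprocess(img):
    """Stand-in for the shared Siamese branch: the same function
    (same weights) is applied to both inputs, so a constant lighting
    offset between the two shots cancels out."""
    return (img - img.mean()) / (img.std() + 1e-8)

rng = np.random.default_rng(0)
scene = rng.random((8, 8))
img_1 = scene          # first pass over the runway
img_2 = scene + 0.3    # same scene under brighter lighting

# After shared preprocessing, the lighting offset is gone:
residual = np.abs(preprocess(img_1) - preprocess(img_2)).max()
print(residual < 1e-6)  # True
```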

Results

Input 1 Input 2 Prediction Target

Conclusion

We can see the results are pretty good.

Next step

I will directly mask the non-overlapping parts of the two images to improve the detection and speed up the training.

experience_3

Information

This time, I decide to mask the parts of the images that do not overlap.
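Computing the overlap mask can be sketched like this (a hypothetical helper, assuming each warped orthophoto carries zeros outside its valid area):

```python
import numpy as np

def overlap_mask(img_1, img_2):
    """1 where BOTH warped images have valid (non-black) pixels."""
    valid_1 = img_1.sum(axis=-1) > 0
    valid_2 = img_2.sum(axis=-1) > 0
    return (valid_1 & valid_2).astype(np.float32)

img_1 = np.zeros((4, 4, 3))
img_1[:, :3] = 0.5   # valid on the left three columns
img_2 = np.zeros((4, 4, 3))
img_2[:, 1:] = 0.5   # valid on the right three columns

mask = overlap_mask(img_1, img_2)
print(mask[0])  # [0. 1. 1. 0.]
```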

Results

The results are truly good, as you can see here.

Samples:

Input 1 Input 2 Output Target
# Reduce the learning rate once val_loss plateaus, then stop early,
# restoring the best weights.
reduce_lr_loss = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=15, verbose=1, min_delta=1e-3, mode='min')
earlyStopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=25, verbose=0, mode='min', restore_best_weights=True)

epochs = 100

# Stack the masked difference image and the overlap mask on the channel
# axis, and train against the ground-truth FOD mask.
diff2mask_model.fit(
    x=np.concatenate([
            dataset["train"]["diff_image"] / 255,
            dataset["train"]["diff_mask"] / 1.
        ],
        axis=-1
    ),
    y=dataset["train"]["fod_mask"] / 1.,
    epochs=epochs,
    batch_size=10,
    validation_split=0.2,
    callbacks=[reduce_lr_loss, earlyStopping],
)

diff2mask_model.save("diff2mask_model" + "_" + str(epochs) + "epochs_" + timestamp())
Epoch 1/100
15/15 [==============================] - 2s 102ms/step - loss: 0.5273 - val_loss: 0.0544
Epoch 2/100
15/15 [==============================] - 1s 85ms/step - loss: 0.0708 - val_loss: 0.0306
Epoch 3/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0268 - val_loss: 0.0204
Epoch 4/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0217 - val_loss: 0.0198
Epoch 5/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0204 - val_loss: 0.0188
Epoch 6/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0192 - val_loss: 0.0176
Epoch 7/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0179 - val_loss: 0.0164
Epoch 8/100
15/15 [==============================] - 1s 87ms/step - loss: 0.0164 - val_loss: 0.0152
Epoch 9/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0156 - val_loss: 0.0150
Epoch 10/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0154 - val_loss: 0.0147
Epoch 11/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0152 - val_loss: 0.0144
Epoch 12/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0150 - val_loss: 0.0143
Epoch 13/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0149 - val_loss: 0.0142
Epoch 14/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0148 - val_loss: 0.0140
Epoch 15/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0144 - val_loss: 0.0137
Epoch 16/100
15/15 [==============================] - 1s 87ms/step - loss: 0.0138 - val_loss: 0.0126
Epoch 17/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0120 - val_loss: 0.0106
Epoch 18/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0090 - val_loss: 0.0112
Epoch 19/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0092 - val_loss: 0.0076
Epoch 20/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0070 - val_loss: 0.0054
Epoch 21/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0064 - val_loss: 0.0053
Epoch 22/100
15/15 [==============================] - 1s 87ms/step - loss: 0.0052 - val_loss: 0.0047
Epoch 23/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0050 - val_loss: 0.0044
Epoch 24/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0050 - val_loss: 0.0041
Epoch 25/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0045 - val_loss: 0.0039
Epoch 26/100
15/15 [==============================] - 1s 87ms/step - loss: 0.0049 - val_loss: 0.0040
Epoch 27/100
15/15 [==============================] - 1s 87ms/step - loss: 0.0043 - val_loss: 0.0040
Epoch 28/100
15/15 [==============================] - 1s 87ms/step - loss: 0.0040 - val_loss: 0.0037
Epoch 29/100
15/15 [==============================] - 1s 87ms/step - loss: 0.0037 - val_loss: 0.0034
Epoch 30/100
15/15 [==============================] - 1s 87ms/step - loss: 0.0037 - val_loss: 0.0032
Epoch 31/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0035 - val_loss: 0.0031
Epoch 32/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0039 - val_loss: 0.0026
Epoch 33/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0030 - val_loss: 0.0027
Epoch 34/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0033 - val_loss: 0.0024
Epoch 35/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0027 - val_loss: 0.0025
Epoch 36/100
15/15 [==============================] - 1s 87ms/step - loss: 0.0028 - val_loss: 0.0021
Epoch 37/100
15/15 [==============================] - 1s 87ms/step - loss: 0.0026 - val_loss: 0.0023
Epoch 38/100
15/15 [==============================] - 1s 87ms/step - loss: 0.0028 - val_loss: 0.0019
Epoch 39/100
15/15 [==============================] - 1s 87ms/step - loss: 0.0024 - val_loss: 0.0021
Epoch 40/100
15/15 [==============================] - 1s 87ms/step - loss: 0.0032 - val_loss: 0.0021
Epoch 41/100
15/15 [==============================] - 1s 86ms/step - loss: 0.0024 - val_loss: 0.0020
Epoch 42/100
15/15 [==============================] - 3s 192ms/step - loss: 0.0026 - val_loss: 0.0020
Epoch 43/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0028 - val_loss: 0.0021
Epoch 44/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0020 - val_loss: 0.0022
Epoch 45/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0022 - val_loss: 0.0018
Epoch 46/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0025 - val_loss: 0.0019
Epoch 47/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0023 - val_loss: 0.0015
Epoch 48/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0022 - val_loss: 0.0018
Epoch 49/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0019 - val_loss: 0.0014
Epoch 50/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0022 - val_loss: 0.0017
Epoch 51/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0022 - val_loss: 0.0014
Epoch 52/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0022 - val_loss: 0.0022
Epoch 53/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0021 - val_loss: 0.0016
Epoch 54/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0018 - val_loss: 0.0016
Epoch 55/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0020 - val_loss: 0.0013
Epoch 56/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0023 - val_loss: 0.0015
Epoch 57/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0021 - val_loss: 0.0014
Epoch 58/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0023 - val_loss: 0.0015
Epoch 59/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0017 - val_loss: 0.0016
Epoch 60/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0019 - val_loss: 0.0016
Epoch 61/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0022 - val_loss: 0.0013
Epoch 62/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0021 - val_loss: 0.0014

Epoch 00062: ReduceLROnPlateau reducing learning rate to 0.00010000000474974513.
Epoch 63/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0017 - val_loss: 0.0014
Epoch 64/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0022 - val_loss: 0.0013
Epoch 65/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0018 - val_loss: 0.0013
Epoch 66/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0016 - val_loss: 0.0012
Epoch 67/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0020 - val_loss: 0.0012
Epoch 68/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0014 - val_loss: 0.0012
Epoch 69/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0016 - val_loss: 0.0012
Epoch 70/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0019 - val_loss: 0.0012
Epoch 71/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0017 - val_loss: 0.0012
Epoch 72/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0023 - val_loss: 0.0012
Epoch 73/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0018 - val_loss: 0.0012
Epoch 74/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0020 - val_loss: 0.0012
Epoch 75/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0017 - val_loss: 0.0012
Epoch 76/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0020 - val_loss: 0.0012
Epoch 77/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0014 - val_loss: 0.0012

Epoch 00077: ReduceLROnPlateau reducing learning rate to 1.0000000474974514e-05.
Epoch 78/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0016 - val_loss: 0.0012
Epoch 79/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0016 - val_loss: 0.0012
Epoch 80/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0015 - val_loss: 0.0012
Epoch 81/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0018 - val_loss: 0.0012
Epoch 82/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0015 - val_loss: 0.0012
Epoch 83/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0016 - val_loss: 0.0012
Epoch 84/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0014 - val_loss: 0.0012
Epoch 85/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0017 - val_loss: 0.0012
Epoch 86/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0017 - val_loss: 0.0012
Epoch 87/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0020 - val_loss: 0.0012
Epoch 88/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0018 - val_loss: 0.0012
Epoch 89/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0021 - val_loss: 0.0012
Epoch 90/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0015 - val_loss: 0.0012
Epoch 91/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0015 - val_loss: 0.0012
Epoch 92/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0021 - val_loss: 0.0012

Epoch 00092: ReduceLROnPlateau reducing learning rate to 1.0000000656873453e-06.
Epoch 93/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0023 - val_loss: 0.0012
Epoch 94/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0018 - val_loss: 0.0012
Epoch 95/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0017 - val_loss: 0.0012
Epoch 96/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0014 - val_loss: 0.0012
Epoch 97/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0019 - val_loss: 0.0012
Epoch 98/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0017 - val_loss: 0.0012
Epoch 99/100
15/15 [==============================] - 1s 78ms/step - loss: 0.0017 - val_loss: 0.0012
Epoch 100/100
15/15 [==============================] - 1s 77ms/step - loss: 0.0017 - val_loss: 0.0012

Conclusion

The model seems to converge faster when the parts outside the intersection are masked.

So, it is truly better to mask the non-overlapping parts of the input images.

Next Step

But to decide whether using two inputs is truly needed, I will try the same model with only one input.

experience_4

Information

Now, I test whether the same model, in the exact same conditions but with only one input, can do as well as the previous one.

Results

The results are truly good, as you can see here.

Samples:

Input Output Target
reduce_lr_loss = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=15, verbose=1, min_delta=1e-3, mode='min')
earlyStopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=25, verbose=0, mode='min', restore_best_weights=True)

epochs = 100

# Same training setup as before, but the callbacks are disabled for this run.
diff2mask_model.fit(
    x=np.concatenate([
            dataset["train"]["diff_image"] / 255,
            dataset["train"]["diff_mask"] / 1.
        ],
        axis=-1
    ),
    y=dataset["train"]["fod_mask"] / 1.,
    epochs=epochs,
    batch_size=10,
    validation_split=0.2
    #callbacks=[reduce_lr_loss, earlyStopping],
)

diff2mask_model.save("diff2mask_model" + "_" + str(epochs) + "epochs_" + timestamp())
Epoch 1/100
15/15 [==============================] - 2s 86ms/step - loss: 0.6452 - val_loss: 0.4755
Epoch 2/100
15/15 [==============================] - 1s 70ms/step - loss: 0.1265 - val_loss: 0.0267
Epoch 3/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0238 - val_loss: 0.0173
Epoch 4/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0174 - val_loss: 0.0152
Epoch 5/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0161 - val_loss: 0.0142
Epoch 6/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0145 - val_loss: 0.0127
Epoch 7/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0134 - val_loss: 0.0123
Epoch 8/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0134 - val_loss: 0.0122
Epoch 9/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0135 - val_loss: 0.0125
Epoch 10/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0134 - val_loss: 0.0126
Epoch 11/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0133 - val_loss: 0.0122
Epoch 12/100
15/15 [==============================] - 1s 70ms/step - loss: 0.0133 - val_loss: 0.0125
Epoch 13/100
15/15 [==============================] - 1s 70ms/step - loss: 0.0133 - val_loss: 0.0121
Epoch 14/100
15/15 [==============================] - 1s 70ms/step - loss: 0.0131 - val_loss: 0.0121
Epoch 15/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0131 - val_loss: 0.0121
Epoch 16/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0131 - val_loss: 0.0119
Epoch 17/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0128 - val_loss: 0.0110
Epoch 18/100
15/15 [==============================] - 1s 74ms/step - loss: 0.0123 - val_loss: 0.0110
Epoch 19/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0117 - val_loss: 0.0108
Epoch 20/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0112 - val_loss: 0.0088
Epoch 21/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0106 - val_loss: 0.0086
Epoch 22/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0098 - val_loss: 0.0068
Epoch 23/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0116 - val_loss: 0.0091
Epoch 24/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0103 - val_loss: 0.0087
Epoch 25/100
15/15 [==============================] - 1s 70ms/step - loss: 0.0097 - val_loss: 0.0080
Epoch 26/100
15/15 [==============================] - 1s 67ms/step - loss: 0.0091 - val_loss: 0.0078
Epoch 27/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0085 - val_loss: 0.0067
Epoch 28/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0078 - val_loss: 0.0060
Epoch 29/100
15/15 [==============================] - 1s 68ms/step - loss: 0.0068 - val_loss: 0.0049
Epoch 30/100
15/15 [==============================] - 1s 68ms/step - loss: 0.0058 - val_loss: 0.0041
Epoch 31/100
15/15 [==============================] - 1s 69ms/step - loss: 0.0051 - val_loss: 0.0040
Epoch 32/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0066 - val_loss: 0.0044
Epoch 33/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0063 - val_loss: 0.0050
Epoch 34/100
15/15 [==============================] - 1s 69ms/step - loss: 0.0053 - val_loss: 0.0041
Epoch 35/100
15/15 [==============================] - 1s 74ms/step - loss: 0.0046 - val_loss: 0.0035
Epoch 36/100
15/15 [==============================] - 1s 66ms/step - loss: 0.0041 - val_loss: 0.0037
Epoch 37/100
15/15 [==============================] - 1s 75ms/step - loss: 0.0043 - val_loss: 0.0033
Epoch 38/100
15/15 [==============================] - 1s 66ms/step - loss: 0.0042 - val_loss: 0.0032
Epoch 39/100
15/15 [==============================] - 1s 70ms/step - loss: 0.0039 - val_loss: 0.0030
Epoch 40/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0032 - val_loss: 0.0028
Epoch 41/100
15/15 [==============================] - 1s 70ms/step - loss: 0.0034 - val_loss: 0.0028
Epoch 42/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0034 - val_loss: 0.0027
Epoch 43/100
15/15 [==============================] - 1s 67ms/step - loss: 0.0030 - val_loss: 0.0026
Epoch 44/100
15/15 [==============================] - 1s 74ms/step - loss: 0.0031 - val_loss: 0.0029
Epoch 45/100
15/15 [==============================] - 1s 75ms/step - loss: 0.0036 - val_loss: 0.0025
Epoch 46/100
15/15 [==============================] - 1s 74ms/step - loss: 0.0028 - val_loss: 0.0023
Epoch 47/100
15/15 [==============================] - 1s 68ms/step - loss: 0.0027 - val_loss: 0.0022
Epoch 48/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0026 - val_loss: 0.0021
Epoch 49/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0025 - val_loss: 0.0021
Epoch 50/100
15/15 [==============================] - 1s 74ms/step - loss: 0.0025 - val_loss: 0.0020
Epoch 51/100
15/15 [==============================] - 1s 69ms/step - loss: 0.0024 - val_loss: 0.0019
Epoch 52/100
15/15 [==============================] - 1s 67ms/step - loss: 0.0024 - val_loss: 0.0019
Epoch 53/100
15/15 [==============================] - 1s 70ms/step - loss: 0.0021 - val_loss: 0.0018
Epoch 54/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0021 - val_loss: 0.0018
Epoch 55/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0024 - val_loss: 0.0019
Epoch 56/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0023 - val_loss: 0.0017
Epoch 57/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0020 - val_loss: 0.0016
Epoch 58/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0022 - val_loss: 0.0016
Epoch 59/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0016
Epoch 60/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0019 - val_loss: 0.0017
Epoch 61/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0019 - val_loss: 0.0017
Epoch 62/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0019 - val_loss: 0.0015
Epoch 63/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0020 - val_loss: 0.0017
Epoch 64/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0021 - val_loss: 0.0015
Epoch 65/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0017
Epoch 66/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0030 - val_loss: 0.0019
Epoch 67/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0021 - val_loss: 0.0017
Epoch 68/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0020 - val_loss: 0.0015
Epoch 69/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0019 - val_loss: 0.0015
Epoch 70/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0019 - val_loss: 0.0015
Epoch 71/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0017 - val_loss: 0.0015
Epoch 72/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0014
Epoch 73/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0017 - val_loss: 0.0014
Epoch 74/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0017 - val_loss: 0.0014
Epoch 75/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0015
Epoch 76/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0018 - val_loss: 0.0014
Epoch 77/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0015
Epoch 78/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0014
Epoch 79/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0020 - val_loss: 0.0014
Epoch 80/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0016 - val_loss: 0.0013
Epoch 81/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0017 - val_loss: 0.0016
Epoch 82/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0024 - val_loss: 0.0016
Epoch 83/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0015
Epoch 84/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0017 - val_loss: 0.0013
Epoch 85/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0013
Epoch 86/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0018 - val_loss: 0.0013
Epoch 87/100
15/15 [==============================] - 1s 74ms/step - loss: 0.0017 - val_loss: 0.0013
Epoch 88/100
15/15 [==============================] - 1s 74ms/step - loss: 0.0018 - val_loss: 0.0012
Epoch 89/100
15/15 [==============================] - 1s 80ms/step - loss: 0.0015 - val_loss: 0.0013
Epoch 90/100
15/15 [==============================] - 1s 77ms/step - loss: 0.0016 - val_loss: 0.0013
Epoch 91/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0012
Epoch 92/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0017 - val_loss: 0.0012
Epoch 93/100
15/15 [==============================] - 1s 76ms/step - loss: 0.0015 - val_loss: 0.0013
Epoch 94/100
15/15 [==============================] - 1s 75ms/step - loss: 0.0014 - val_loss: 0.0013
Epoch 95/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0014 - val_loss: 0.0012
Epoch 96/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0014 - val_loss: 0.0012
Epoch 97/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0015 - val_loss: 0.0013
Epoch 98/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0015 - val_loss: 0.0013
Epoch 99/100
15/15 [==============================] - 1s 76ms/step - loss: 0.0015 - val_loss: 0.0012
Epoch 100/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0015 - val_loss: 0.0013

Conclusion

We can see that using only one image as input didn't truly impact the accuracy of the model; the best validation loss for both the previous model and this one is 0.0012.

This model may just be a little slower to converge, but not by much.

Next step

I will try the RR images to see if the model works well on them too.

experience_5

Information

I tried the RR images with a model that takes only one input.

Results

The results are truly good, as you can see here.

Samples:

Input Output Target
reduce_lr_loss = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=15, verbose=1, min_delta=1e-3, mode='min')
earlyStopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=25, verbose=0, mode='min', restore_best_weights=True)

epochs = 100

diff2mask_model.fit(
    x=np.concatenate(
        [
            dataset["train"]["diff_image"] / 255,
            dataset["train"]["diff_mask"] / 1.
        ],
        axis=-1
    ),
    y=dataset["train"]["fod_mask"] / 1.,
    epochs=epochs,
    batch_size=10,
    validation_split=0.2,
    # callbacks=[reduce_lr_loss, earlyStopping],  # disabled for this run
)

diff2mask_model.save("diff2mask_model_" + str(epochs) + "epochs_" + timestamp())
Epoch 1/100
15/15 [==============================] - 2s 86ms/step - loss: 0.6452 - val_loss: 0.4755
Epoch 2/100
15/15 [==============================] - 1s 70ms/step - loss: 0.1265 - val_loss: 0.0267
Epoch 3/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0238 - val_loss: 0.0173
Epoch 4/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0174 - val_loss: 0.0152
Epoch 5/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0161 - val_loss: 0.0142
Epoch 6/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0145 - val_loss: 0.0127
Epoch 7/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0134 - val_loss: 0.0123
Epoch 8/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0134 - val_loss: 0.0122
Epoch 9/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0135 - val_loss: 0.0125
Epoch 10/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0134 - val_loss: 0.0126
Epoch 11/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0133 - val_loss: 0.0122
Epoch 12/100
15/15 [==============================] - 1s 70ms/step - loss: 0.0133 - val_loss: 0.0125
Epoch 13/100
15/15 [==============================] - 1s 70ms/step - loss: 0.0133 - val_loss: 0.0121
Epoch 14/100
15/15 [==============================] - 1s 70ms/step - loss: 0.0131 - val_loss: 0.0121
Epoch 15/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0131 - val_loss: 0.0121
Epoch 16/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0131 - val_loss: 0.0119
Epoch 17/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0128 - val_loss: 0.0110
Epoch 18/100
15/15 [==============================] - 1s 74ms/step - loss: 0.0123 - val_loss: 0.0110
Epoch 19/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0117 - val_loss: 0.0108
Epoch 20/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0112 - val_loss: 0.0088
Epoch 21/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0106 - val_loss: 0.0086
Epoch 22/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0098 - val_loss: 0.0068
Epoch 23/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0116 - val_loss: 0.0091
Epoch 24/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0103 - val_loss: 0.0087
Epoch 25/100
15/15 [==============================] - 1s 70ms/step - loss: 0.0097 - val_loss: 0.0080
Epoch 26/100
15/15 [==============================] - 1s 67ms/step - loss: 0.0091 - val_loss: 0.0078
Epoch 27/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0085 - val_loss: 0.0067
Epoch 28/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0078 - val_loss: 0.0060
Epoch 29/100
15/15 [==============================] - 1s 68ms/step - loss: 0.0068 - val_loss: 0.0049
Epoch 30/100
15/15 [==============================] - 1s 68ms/step - loss: 0.0058 - val_loss: 0.0041
Epoch 31/100
15/15 [==============================] - 1s 69ms/step - loss: 0.0051 - val_loss: 0.0040
Epoch 32/100
15/15 [==============================] - 1s 79ms/step - loss: 0.0066 - val_loss: 0.0044
Epoch 33/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0063 - val_loss: 0.0050
Epoch 34/100
15/15 [==============================] - 1s 69ms/step - loss: 0.0053 - val_loss: 0.0041
Epoch 35/100
15/15 [==============================] - 1s 74ms/step - loss: 0.0046 - val_loss: 0.0035
Epoch 36/100
15/15 [==============================] - 1s 66ms/step - loss: 0.0041 - val_loss: 0.0037
Epoch 37/100
15/15 [==============================] - 1s 75ms/step - loss: 0.0043 - val_loss: 0.0033
Epoch 38/100
15/15 [==============================] - 1s 66ms/step - loss: 0.0042 - val_loss: 0.0032
Epoch 39/100
15/15 [==============================] - 1s 70ms/step - loss: 0.0039 - val_loss: 0.0030
Epoch 40/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0032 - val_loss: 0.0028
Epoch 41/100
15/15 [==============================] - 1s 70ms/step - loss: 0.0034 - val_loss: 0.0028
Epoch 42/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0034 - val_loss: 0.0027
Epoch 43/100
15/15 [==============================] - 1s 67ms/step - loss: 0.0030 - val_loss: 0.0026
Epoch 44/100
15/15 [==============================] - 1s 74ms/step - loss: 0.0031 - val_loss: 0.0029
Epoch 45/100
15/15 [==============================] - 1s 75ms/step - loss: 0.0036 - val_loss: 0.0025
Epoch 46/100
15/15 [==============================] - 1s 74ms/step - loss: 0.0028 - val_loss: 0.0023
Epoch 47/100
15/15 [==============================] - 1s 68ms/step - loss: 0.0027 - val_loss: 0.0022
Epoch 48/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0026 - val_loss: 0.0021
Epoch 49/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0025 - val_loss: 0.0021
Epoch 50/100
15/15 [==============================] - 1s 74ms/step - loss: 0.0025 - val_loss: 0.0020
Epoch 51/100
15/15 [==============================] - 1s 69ms/step - loss: 0.0024 - val_loss: 0.0019
Epoch 52/100
15/15 [==============================] - 1s 67ms/step - loss: 0.0024 - val_loss: 0.0019
Epoch 53/100
15/15 [==============================] - 1s 70ms/step - loss: 0.0021 - val_loss: 0.0018
Epoch 54/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0021 - val_loss: 0.0018
Epoch 55/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0024 - val_loss: 0.0019
Epoch 56/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0023 - val_loss: 0.0017
Epoch 57/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0020 - val_loss: 0.0016
Epoch 58/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0022 - val_loss: 0.0016
Epoch 59/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0016
Epoch 60/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0019 - val_loss: 0.0017
Epoch 61/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0019 - val_loss: 0.0017
Epoch 62/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0019 - val_loss: 0.0015
Epoch 63/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0020 - val_loss: 0.0017
Epoch 64/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0021 - val_loss: 0.0015
Epoch 65/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0017
Epoch 66/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0030 - val_loss: 0.0019
Epoch 67/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0021 - val_loss: 0.0017
Epoch 68/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0020 - val_loss: 0.0015
Epoch 69/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0019 - val_loss: 0.0015
Epoch 70/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0019 - val_loss: 0.0015
Epoch 71/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0017 - val_loss: 0.0015
Epoch 72/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0014
Epoch 73/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0017 - val_loss: 0.0014
Epoch 74/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0017 - val_loss: 0.0014
Epoch 75/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0015
Epoch 76/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0018 - val_loss: 0.0014
Epoch 77/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0015
Epoch 78/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0014
Epoch 79/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0020 - val_loss: 0.0014
Epoch 80/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0016 - val_loss: 0.0013
Epoch 81/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0017 - val_loss: 0.0016
Epoch 82/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0024 - val_loss: 0.0016
Epoch 83/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0015
Epoch 84/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0017 - val_loss: 0.0013
Epoch 85/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0013
Epoch 86/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0018 - val_loss: 0.0013
Epoch 87/100
15/15 [==============================] - 1s 74ms/step - loss: 0.0017 - val_loss: 0.0013
Epoch 88/100
15/15 [==============================] - 1s 74ms/step - loss: 0.0018 - val_loss: 0.0012
Epoch 89/100
15/15 [==============================] - 1s 80ms/step - loss: 0.0015 - val_loss: 0.0013
Epoch 90/100
15/15 [==============================] - 1s 77ms/step - loss: 0.0016 - val_loss: 0.0013
Epoch 91/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0018 - val_loss: 0.0012
Epoch 92/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0017 - val_loss: 0.0012
Epoch 93/100
15/15 [==============================] - 1s 76ms/step - loss: 0.0015 - val_loss: 0.0013
Epoch 94/100
15/15 [==============================] - 1s 75ms/step - loss: 0.0014 - val_loss: 0.0013
Epoch 95/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0014 - val_loss: 0.0012
Epoch 96/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0014 - val_loss: 0.0012
Epoch 97/100
15/15 [==============================] - 1s 72ms/step - loss: 0.0015 - val_loss: 0.0013
Epoch 98/100
15/15 [==============================] - 1s 73ms/step - loss: 0.0015 - val_loss: 0.0013
Epoch 99/100
15/15 [==============================] - 1s 76ms/step - loss: 0.0015 - val_loss: 0.0012
Epoch 100/100
15/15 [==============================] - 1s 71ms/step - loss: 0.0015 - val_loss: 0.0013

Conclusion

We can see that using only one image as input didn't truly impact the accuracy of the model; the best validation loss for both the previous model and this one is 0.0012.

This model may just be a little slower to converge, but not by much.

Next step

The next step will be to test whether it works well at detecting real FODs.

experience_6

Information

Test if the one-input-diff2mask model works on true FOD images from the FOD dataset.

I reduced the size of the generated FODs to make them less obvious.
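For context, shrinking a synthetic FOD before blending can be as simple as a block-average downscale. This is only a sketch with a hypothetical `shrink_patch` helper, not the actual generator code:

```python
import numpy as np

def shrink_patch(patch: np.ndarray, factor: int) -> np.ndarray:
    """Downscale a 2D synthetic FOD patch by an integer factor using
    block averaging (a real pipeline might use cv2.resize instead)."""
    h, w = patch.shape[:2]
    h2, w2 = h // factor, w // factor
    cropped = patch[:h2 * factor, :w2 * factor]
    return cropped.reshape(h2, factor, w2, factor).mean(axis=(1, 3))

# Toy 8x8 patch shrunk to 4x4 before being blended into the runway image
patch = np.ones((8, 8), dtype=np.float32)
small = shrink_patch(patch, 2)
```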

Results

The results are truly poor on the validation dataset, as you can see here.

Samples:

Input Output Target

As you can see, the model failed to find the true FODs in the validation dataset.

Conclusion

As we can see, the one-input-diff2mask-UNET model can't detect real FODs, so it may be better to feed that model with stitching diffs, or to resume the autoencoder experiments that use unsupervised learning.

Next step

We will try to improve the FOD generator and blender to produce sharper FODs.

experience_7

Information

Change the method that generates artificial FODs and blends them with the images.

The new generator produces sharper FODs that are no longer blurry.
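As an illustration of the idea (not the actual blender code, all names are hypothetical), pasting the patch with a hard binary mask instead of a feathered alpha keeps the inserted FOD sharp:

```python
import numpy as np

def blend_fod(background: np.ndarray, fod: np.ndarray,
              mask: np.ndarray) -> np.ndarray:
    """Paste a FOD patch using a hard binary mask, so the inserted
    object keeps crisp edges instead of a blurry feathered border."""
    hard = (mask > 0.5).astype(background.dtype)
    return background * (1 - hard) + fod * hard

bg = np.zeros((4, 4), dtype=np.float32)   # empty runway tile
obj = np.ones((4, 4), dtype=np.float32)   # bright FOD patch
m = np.zeros((4, 4)); m[1:3, 1:3] = 1     # where to paste it
out = blend_fod(bg, obj, m)
```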

Results

The results truly improve on the validation dataset, as you can see here.

Samples:

Input Output Target

Here you can see the curve of the IoU metric as a function of the binarisation threshold for the generated FODs:

And here, the same curve but for the real FODs:

Conclusion

As we can see, it is truly the quality of the generated FODs that impacts real FOD detection.
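An IoU-vs-threshold curve like the ones above can be computed by binarising the model's soft output at each threshold and scoring it against the ground-truth mask. This is a minimal sketch with toy arrays, not the evaluation code used here:

```python
import numpy as np

def iou(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """Intersection-over-Union between two boolean masks."""
    inter = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    return float(inter) / float(union) if union > 0 else 1.0

def iou_vs_threshold(pred: np.ndarray, true_mask: np.ndarray, thresholds):
    """One IoU value per binarisation threshold."""
    true_bool = true_mask.astype(bool)
    return [iou(pred > t, true_bool) for t in thresholds]

# Hypothetical 4x4 soft prediction and its ground-truth mask
pred = np.array([[0.9, 0.8, 0.1, 0.0],
                 [0.7, 0.6, 0.2, 0.1],
                 [0.1, 0.2, 0.0, 0.0],
                 [0.0, 0.1, 0.0, 0.0]])
true = np.zeros((4, 4)); true[:2, :2] = 1
curve = iou_vs_threshold(pred, true, [0.1, 0.5, 0.9])
```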

Next step

I will see whether replacing the Siamese preprocessing model with a simple absolute difference of the two images improves the results.

experience_8

Information

New Diff2Mask model: I now use the absolute difference of the two inputs, and I drop the idea of using a Siamese model to 'preprocess' the two images (for example, to remove differences in lighting conditions, etc.).
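The absolute-difference input can be sketched as follows (a minimal illustration with a hypothetical helper, assuming uint8 images of identical shape):

```python
import numpy as np

def make_absdiff_input(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Replace the Siamese preprocessing branch with a plain |A - B|,
    normalised to [0, 1] for the Diff2Mask model."""
    a = img_a.astype(np.float32) / 255.0
    b = img_b.astype(np.float32) / 255.0
    return np.abs(a - b)

# Toy 2x2 grayscale images: the diff is bright where they disagree
ref = np.array([[0, 255], [128, 0]], dtype=np.uint8)
cur = np.array([[0, 0], [128, 255]], dtype=np.uint8)
diff = make_absdiff_input(ref, cur)
```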

Results

The results are extremely good here.

Samples:

Input 1 Input 2 AbsDiff Output Binarised output

You can see that the model is very successful at detecting the real FODs.

The second row shows that the model sometimes also interprets spotlights as FODs, but this problem may easily be corrected by changing the training to output three classes: FODs, spotlights, and white lines. The model should then be able to differentiate FODs from spotlights.
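With a three-class output, the FOD mask would be read off the per-pixel class probabilities. A sketch of that post-processing step (class order and names are assumptions, not the actual model):

```python
import numpy as np

CLASSES = ["fod", "spotlight", "whiteline"]  # hypothetical class order

def fod_mask_from_softmax(probs: np.ndarray) -> np.ndarray:
    """Given a (H, W, 3) per-pixel softmax over the three classes,
    keep only the pixels whose most likely class is 'fod'."""
    return np.argmax(probs, axis=-1) == CLASSES.index("fod")

# Toy 1x2 output: first pixel mostly FOD, second mostly spotlight
probs = np.array([[[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]]])
mask = fod_mask_from_softmax(probs)
```

This way a spotlight pixel can win its own class instead of being forced into the FOD mask.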

I also tried a simple model with only a dozen or so convolutional layers, but it failed terribly; you can see the result here.

Conclusion

At this stage, it seems to work very well. The next step may be to integrate it into the current pipeline.